Access and Representation
Abstract
Health-related research is a broad, interdisciplinary and growing research area. As digital systems that simplify work processes and make them more efficient spread through companies and organisations, the amount of available data has become immense. The information contained in health-related digital data sets could be used for further research and, in the long run, for improving health care, health care processes, and public health. Much of the information in these data sets is unstructured free text. Health-related texts comprise various types of text, such as scientific articles, questionnaire answers, (electronic) health records, information on web sites, and e-mail. What these texts have in common is, above all, the use of a domain-specific vocabulary. Information access methods applied to textual data require a language model. Many human language technology tools have been developed to improve and simplify representation models, primarily for English and predominantly for general language use. Several such tools have also been developed for Swedish. How these tools perform on domain-specific data such as health data is still a relatively uncharted research area. We have investigated what properties the language use in Swedish electronic health records has compared to a large, general-purpose Swedish corpus, in order to identify whether and where adaptation is necessary. We have also created a representation model based on phrases instead of words for Swedish scientific medical text. Health-related texts also contain a potentially large amount of previously unknown information, which could be valuable to exploit in further research.
We have developed an iterative and interactive method for exploring large text sets, based on document clustering, in which both structured and unstructured information is used to generate hypotheses from epidemiological questionnaire data and electronic health records. One of the most important factors influencing the possibility of performing research on health-related data sets is availability. Although digital information is easy to store and obtain automatically, this type of data often contains sensitive and private information, which makes it impossible to distribute for further research unless identifiable information is deleted or replaced. We have initiated work on automatic de-identification for Swedish and created a manually annotated gold standard, which could be used both for evaluating de-identification systems and for training new ones.
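To illustrate how a manually annotated gold standard can serve to evaluate a de-identification system, the following minimal sketch computes exact-match precision, recall and F1 over annotated spans. The span format `(start, end, label)`, the function name `evaluate_spans`, the label names and the example data are all assumptions made for illustration; they are not taken from the work summarised here, which may use a different matching criterion.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical span format: (start offset, end offset, label).
Span = Tuple[int, int, str]


def evaluate_spans(gold: Iterable[Span], predicted: Iterable[Span]) -> Dict[str, float]:
    """Exact-match precision, recall and F1 of predicted spans against gold spans."""
    gold_set = set(gold)
    pred_set = set(predicted)
    # A prediction counts as correct only if offsets and label all match.
    true_positives = len(gold_set & pred_set)
    precision = true_positives / len(pred_set) if pred_set else 0.0
    recall = true_positives / len(gold_set) if gold_set else 0.0
    denominator = precision + recall
    f1 = 2 * precision * recall / denominator if denominator else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


# Made-up example: three gold annotations; the system finds two of them
# and adds one spurious span.
gold = [(0, 4, "First_Name"), (5, 12, "Last_Name"), (30, 40, "Date")]
predicted = [(0, 4, "First_Name"), (30, 40, "Date"), (50, 55, "Location")]
scores = evaluate_spans(gold, predicted)
```

Exact matching is the strictest choice; a partial-overlap or token-level criterion would credit near misses and is a common alternative when annotation boundaries are hard to agree on.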